Data head
| | PANEL | REGION | AGE31X | GENDER | RACE3 | MARRY31X | EDRECODE | FTSTU31X | ACTDTY31 | HONRDC31 | ... | PCS42 | MCS42 | K6SUM42 | PHQ242 | EMPST31 | POVCAT15 | INSCOV15 | INCOME_M | HEALTHEXP | PERSONWT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | 2 | 52 | 0.0 | 0.0 | 5 | 13 | -1 | 2 | 2 | ... | 25.93 | 58.47 | 3 | 0 | 4 | 1 | 2 | 11390.0 | 46612 | 21854.981705 |
| 1 | 19 | 2 | 55 | 1.0 | 0.0 | 3 | 14 | -1 | 2 | 2 | ... | 20.42 | 26.57 | 17 | 6 | 4 | 3 | 2 | 11390.0 | 9207 | 18169.604822 |
| 2 | 19 | 2 | 22 | 1.0 | 0.0 | 5 | 13 | 3 | 2 | 2 | ... | 53.12 | 50.33 | 7 | 0 | 1 | 2 | 2 | 18000.0 | 808 | 17191.832515 |
| 3 | 19 | 2 | 2 | 0.0 | 0.0 | 6 | -1 | -1 | 3 | 3 | ... | -1.00 | -1.00 | -1 | -1 | -1 | 2 | 2 | 385.0 | 2721 | 20261.485463 |
| 4 | 19 | 3 | 25 | 1.0 | 0.0 | 1 | 14 | -1 | 2 | 2 | ... | 59.89 | 45.91 | 9 | 2 | 1 | 3 | 1 | 3700.0 | 1573 | 7620.222014 |
| 5 | 19 | 3 | 48 | 0.0 | 0.0 | 1 | 16 | -1 | 2 | 2 | ... | -1.00 | -1.00 | -1 | -1 | 1 | 5 | 1 | 85000.0 | 432 | 13019.508635 |
| 6 | 19 | 4 | 31 | 0.0 | 1.0 | 5 | 14 | -1 | 2 | 2 | ... | 56.71 | 62.39 | 0 | 0 | 1 | 3 | 1 | 24000.0 | 413 | 3018.208554 |
| 7 | 19 | 1 | 37 | 0.0 | 0.0 | 1 | 15 | -1 | 2 | 2 | ... | 42.91 | 58.76 | 0 | 0 | 1 | 5 | 1 | 56052.0 | 693 | 18017.598727 |
| 8 | 19 | 1 | 35 | 1.0 | 0.0 | 1 | 16 | -1 | 2 | 2 | ... | 54.30 | 43.43 | 4 | 1 | 1 | 5 | 1 | 56052.0 | 5692 | 17508.950341 |
| 9 | 19 | 1 | 5 | 0.0 | 0.0 | 6 | 1 | -1 | 3 | 3 | ... | -1.00 | -1.00 | -1 | -1 | -1 | 5 | 1 | 0.0 | 301 | 18158.487104 |
10 rows × 46 columns
Data description (count = 18350 for every column; dtype float64 throughout)
| Variable | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| PANEL | 19.529264 | 0.499156 | 19.000000 | 19.000000 | 20.000000 | 20.000000 | 20.000000 |
| REGION | 2.607466 | 0.942848 | 1.000000 | 2.000000 | 3.000000 | 3.000000 | 4.000000 |
| AGE31X | 38.746649 | 23.020492 | 0.000000 | 19.000000 | 38.500000 | 57.000000 | 85.000000 |
| GENDER | 0.521526 | 0.499550 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 |
| RACE3 | 0.338147 | 0.473092 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 |
| MARRY31X | 3.590954 | 2.262703 | 1.000000 | 1.000000 | 5.000000 | 5.000000 | 10.000000 |
| EDRECODE | 9.842943 | 6.226279 | -1.000000 | 2.000000 | 13.000000 | 14.000000 | 16.000000 |
| FTSTU31X | -0.759619 | 0.855099 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 3.000000 |
| ACTDTY31 | 2.638692 | 0.813550 | 1.000000 | 2.000000 | 2.000000 | 3.000000 | 4.000000 |
| HONRDC31 | 2.156948 | 0.523573 | 1.000000 | 2.000000 | 2.000000 | 2.000000 | 4.000000 |
| RTHLTH31 | 2.177929 | 1.095924 | -1.000000 | 1.000000 | 2.000000 | 3.000000 | 5.000000 |
| MNHLTH31 | 1.932316 | 1.021325 | -1.000000 | 1.000000 | 2.000000 | 3.000000 | 5.000000 |
| HIBPDX | 1.031771 | 1.176932 | -1.000000 | 1.000000 | 1.000000 | 2.000000 | 2.000000 |
| CHDDX | 1.277384 | 1.246940 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| ANGIDX | 1.302125 | 1.251105 | -1.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 |
| MIDX | 1.288447 | 1.248865 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| OHRTDX | 1.223215 | 1.236044 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| STRKDX | 1.284687 | 1.248222 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| EMPHDX | 1.303215 | 1.251277 | -1.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 |
| CHBRON31 | 1.275368 | 1.267648 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| CHOLDX | 1.078093 | 1.194322 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| CANCERDX | 1.233297 | 1.238259 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| DIABDX | 1.236785 | 1.239005 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| JTPAIN31 | 0.977657 | 1.176663 | -1.000000 | 1.000000 | 1.000000 | 2.000000 | 2.000000 |
| ARTHDX | 1.084687 | 1.196631 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| ARTHTYPE | -0.208174 | 1.460262 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 3.000000 |
| ASTHDX | 1.883106 | 0.330335 | -1.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 |
| ADHDADDX | -0.451989 | 1.136285 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 2.000000 |
| PREGNT31 | -0.44812 | 1.15316 | -1.00000 | -1.00000 | -1.00000 | -1.00000 | 2.00000 |
| WLKLIM31 | 1.865014 | 0.355783 | -1.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 |
| ACTLIM31 | 1.716349 | 0.750265 | -1.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 |
| SOCLIM31 | 1.934877 | 0.265885 | -1.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 |
| COGLIM31 | 1.241744 | 1.261227 | -1.000000 | 1.000000 | 2.000000 | 2.000000 | 2.000000 |
| DFHEAR42 | 1.921308 | 0.364688 | -1.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 |
| DFSEE42 | 1.937602 | 0.344966 | -1.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 |
| ADSMOK42 | 0.900926 | 1.355978 | -1.000000 | -1.000000 | 2.000000 | 2.000000 | 2.000000 |
| PCS42 | 32.348196 | 25.021569 | -9.000000 | -1.000000 | 43.310000 | 55.090000 | 71.060000 |
| MCS42 | 34.368819 | 25.934907 | -9.000000 | -1.000000 | 46.735000 | 57.060000 | 74.980000 |
| K6SUM42 | 1.664687 | 4.106635 | -9.000000 | -1.000000 | 0.000000 | 3.000000 | 24.000000 |
| PHQ242 | 0.136948 | 1.329289 | -1.000000 | -1.000000 | 0.000000 | 0.000000 | 6.000000 |
| EMPST31 | 1.526376 | 1.842521 | -1.000000 | 1.000000 | 1.000000 | 4.000000 | 4.000000 |
| POVCAT15 | 3.510627 | 1.461804 | 1.000000 | 3.000000 | 4.000000 | 5.000000 | 5.000000 |
| INSCOV15 | 1.446921 | 0.624748 | 1.000000 | 1.000000 | 1.000000 | 2.000000 | 3.000000 |
| INCOME_M | 27853.695313 | 36225.013969 | 0.000000 | 0.000000 | 16200.000000 | 40000.000000 | 320299.000000 |
| HEALTHEXP | 5184.511608 | 15126.748532 | 0.000000 | 198.000000 | 1034.000000 | 4219.500000 | 659952.000000 |
| PERSONWT | 11991.877066 | 9405.954874 | 0.000000 | 5470.448863 | 9867.351501 | 15662.855181 | 98103.984953 |

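Many columns in the description have a minimum of -1 or -9. In MEPS public-use files negative values are reserved codes (for example, -1 marks "inapplicable") rather than measurements, so they pull statistics such as the PCS42 mean of 32.35 downward. The following is a minimal sketch, assuming that every negative value in such a column is a reserved code, of summarising a column with those codes masked out. The helper name `summarize_valid` is hypothetical, and the sample values are the ten PCS42 entries visible in the data head.

```python
from statistics import mean, median


def summarize_valid(values, is_reserved=lambda v: v < 0):
    """Summarise a numeric column after dropping reserved codes.

    Assumption: every negative value is a reserved code (e.g. -1, -9),
    not a real measurement. Pass a different `is_reserved` otherwise.
    """
    valid = [v for v in values if not is_reserved(v)]
    if not valid:
        return {"n_valid": 0, "n_reserved": len(values)}
    return {
        "n_valid": len(valid),
        "n_reserved": len(values) - len(valid),
        "mean": mean(valid),
        "median": median(valid),
        "min": min(valid),
        "max": max(valid),
    }


# PCS42 values copied from the ten rows shown in the data head.
pcs42_head = [25.93, 20.42, 53.12, -1.00, 59.89, -1.00, 56.71, 42.91, 54.30, -1.00]
summary = summarize_valid(pcs42_head)
print(summary)
```

Whether masking is appropriate depends on the column: for a variable such as PREGNT31, "inapplicable" may itself be informative, so the sketch is a starting point rather than a rule.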
Categorical variables: ['HIBPDX', 'CHDDX', 'ANGIDX', 'MIDX', 'OHRTDX', 'STRKDX', 'EMPHDX', 'CHBRON31', 'CHOLDX', 'CANCERDX', 'DIABDX', 'JTPAIN31', 'ARTHDX', 'ASTHDX', 'ADHDADDX', 'PREGNT31', 'WLKLIM31', 'ACTLIM31', 'SOCLIM31', 'COGLIM31', 'DFHEAR42', 'DFSEE42', 'ADSMOK42']
Model results:
| Model | Training RMSE | Training R² | Training MAE | Test RMSE | Test R² | Test MAE |
|---|---|---|---|---|---|---|
| XGB | 1.977698616711458 | 0.48684795508177536 | 1.4688560597283682 | 2.1665237944616744 | 0.37313414322890304 | 1.6149617997637638 |
| Lasso Regression | 2.530774417382313 | 0.15970318823228213 | 1.9013310836499353 | 2.4984493904113023 | 0.1663403093243384 | 1.862761172732874 |

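The output does not show how the RMSE, R² and MAE figures were computed. The sketch below assumes the conventional definitions and uses only the standard library; the function name `regression_metrics` and the toy values are illustrative and are not taken from the MEPS data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, R^2 and MAE for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    y_mean = sum(y_true) / n
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        # R^2 is undefined when the target is constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Toy values only, chosen so the arithmetic is easy to check by hand.
y_true = [0.0, 2.0, 4.0, 6.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Applying the same function to the training and test splits gives the comparison reported above: XGB's R² drops from about 0.49 on the training data to about 0.37 on the test data, while Lasso's stays near 0.16 on both.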
Preparation of a new explainer is initiated
-> data : 14680 rows 44 cols
-> target variable : Argument 'y' was a pandas.Series. Converted to a numpy.ndarray.
-> target variable : 14680 values
-> model_class : xgboost.sklearn.XGBRegressor (default)
-> label : MEPS
-> predict function : <function yhat_default at 0x7f9148a5e670> will be used (default)
-> predicted values : min = -0.8554313, mean = 5.7083073, max = 10.244568
-> residual function : difference between y and yhat (default)
-> residuals : min = -7.963533878326416, mean = 0.0019564959693102344, max = 6.395156118316233
-> model_info : package xgboost
A new explainer has been created!
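The explainer log reports residuals as the difference between y and yhat, summarised by their minimum, mean and maximum. A minimal sketch of that summary follows, using the three prediction/true-value pairs printed for patients 3639, 975 and 896 in this output. The function name `residual_summary` is hypothetical, and this is not the explainer library's own implementation.

```python
def residual_summary(y_true, y_pred):
    """Residuals defined as y - yhat, with their min, mean and max."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return {
        "residuals": residuals,
        "min": min(residuals),
        "mean": sum(residuals) / len(residuals),
        "max": max(residuals),
    }


# True values and predictions for patients 3639, 975 and 896.
y_true = [6.50884896, 0.0, 8.82324185]
y_pred = [6.5051827, 0.00083711743, 8.826893]
res = residual_summary(y_true, y_pred)
print(res["min"], res["mean"], res["max"])
```

With this sign convention, a negative residual means the model predicted above the true value.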
Patient no.: 3639, prediction value: 6.5051827, true value: [6.50884896]
Calculating ceteris paribus!: 100%|██████████| 44/44 [00:00<00:00, 148.77it/s]
Patient no.: 975, prediction value: 0.00083711743, true value: [0.]
Calculating ceteris paribus!: 100%|██████████| 44/44 [00:00<00:00, 47.70it/s]
Patient no.: 896, prediction value: 8.826893, true value: [8.82324185]
Calculating ceteris paribus!: 100%|██████████| 44/44 [00:00<00:00, 122.92it/s]
Creating explainer for linear model
Preparation of a new explainer is initiated
-> data : 14680 rows 44 cols
-> target variable : Argument 'y' was a pandas.Series. Converted to a numpy.ndarray.
-> target variable : 14680 values
-> model_class : sklearn.linear_model._coordinate_descent.Lasso (default)
-> label : MEPS
-> predict function : <function yhat_default at 0x7f9148a5e670> will be used (default)
-> predicted values : min = 3.251906166715444, mean = 5.710264253185743, max = 9.922703669439233
-> residual function : difference between y and yhat (default)
-> residuals : min = -9.922703669439233, mean = -1.8586403987185183e-16, max = 6.282237531056754
-> model_info : package sklearn
A new explainer has been created!
Patient no.: 704, prediction value: 8.046038, true value: [8.04268481]
Calculating ceteris paribus!: 100%|██████████| 44/44 [00:00<00:00, 104.69it/s]
Patient no.: 704, prediction value: 7.507052624767924, true value: [8.04268481]
Calculating ceteris paribus!: 100%|██████████| 44/44 [00:00<00:00, 195.91it/s]
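Each "Calculating ceteris paribus!" run above steps through 44 items, presumably one profile per explanatory column. A ceteris paribus profile takes one observation, varies a single feature over a grid while holding every other feature fixed, and records the model's prediction at each grid point. The sketch below illustrates the idea only and is not the explainer library's implementation. The helper `ceteris_paribus_profile`, the stand-in model `toy_predict`, its coefficients and the example patient are all hypothetical.

```python
def ceteris_paribus_profile(predict, observation, feature, grid):
    """Predictions for copies of `observation` in which only `feature` varies."""
    profile = []
    for value in grid:
        modified = dict(observation)  # copy so the original row stays intact
        modified[feature] = value     # change exactly one feature
        profile.append((value, predict(modified)))
    return profile


def toy_predict(row):
    """Hypothetical stand-in for a fitted model's predict function."""
    return 2.0 + 0.05 * row["AGE31X"] + (1.5 if row["DIABDX"] == 1 else 0.0)


# Hypothetical patient; column names follow the dataset above.
patient = {"AGE31X": 52, "DIABDX": 2}
age_grid = [20, 40, 60, 80]

profile = ceteris_paribus_profile(toy_predict, patient, "AGE31X", age_grid)
for age, pred in profile:
    print(f"AGE31X={age:>2}  prediction={pred:.2f}")
```

Running the same function with two different `predict` callables on the same observation shows how differently two models respond to one feature, which is the comparison the two patient 704 runs make between the XGB and Lasso explainers.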